Optimize vulnerability host counts #24914

Merged
merged 9 commits into from
Jan 13, 2025

mostlikelee
Contributor

@mostlikelee mostlikelee commented Dec 19, 2024

#22364

Batches the SELECT queries that aggregate host counts per CVE, and runs those batches concurrently.

  • Changes file added for user-visible changes in changes/, orbit/changes/ or ee/fleetd-chrome/changes.
    See Changes files for more information.
  • Input data is properly validated, SELECT * is avoided, SQL injection is prevented (using placeholders for values in statements)
  • Added/updated tests
  • Manual QA for all new/changed functionality

@mostlikelee mostlikelee requested a review from a team as a code owner December 19, 2024 18:10

codecov bot commented Dec 19, 2024

Codecov Report

Attention: Patch coverage is 81.16883% with 29 lines in your changes missing coverage. Please review.

Project coverage is 63.55%. Comparing base (9a768ac) to head (2258ee2).
Report is 11 commits behind head on main.

Files with missing lines                     Patch %   Lines
server/datastore/mysql/vulnerabilities.go    82.06%    19 Missing and 7 partials ⚠️
cmd/fleet/cron.go                            0.00%     3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #24914      +/-   ##
==========================================
- Coverage   63.55%   63.55%   -0.01%     
==========================================
  Files        1618     1618              
  Lines      154469   154562      +93     
  Branches     4037     4037              
==========================================
+ Hits        98180    98238      +58     
- Misses      48553    48581      +28     
- Partials     7736     7743       +7     
Flag      Coverage Δ
backend   64.40% <81.16%> (-0.01%) ⬇️

@getvictor
Member

In my opinion, we should not use concurrent batches for this DB access because it will put an extra load on the DB reader.

We know that the vulnerability cron uses a lot of CPU/DB resources, so the goal should be to smooth out the performance spikes. One way to do that is to pause for some time (500ms?) between each batch.
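A minimal sketch of the pause-between-batches idea, for illustration only. It is not code from this PR: `chunk`, `processSequentially`, the batch size, and the CVE IDs are all hypothetical, and the callback stands in for whatever batched SELECT the datastore would run. The 500ms pause is the value floated above.

```go
package main

import (
	"fmt"
	"time"
)

// chunk splits items into consecutive slices of at most size elements.
// Hypothetical helper, not part of the Fleet codebase.
func chunk(items []string, size int) [][]string {
	if size <= 0 {
		size = 1
	}
	var out [][]string
	for start := 0; start < len(items); start += size {
		end := start + size
		if end > len(items) {
			end = len(items)
		}
		out = append(out, items[start:end])
	}
	return out
}

// processSequentially runs fn on one batch at a time and sleeps for pause
// between batches (not after the last one) to spread DB load over time.
func processSequentially(items []string, size int, pause time.Duration, fn func([]string) error) error {
	batches := chunk(items, size)
	for i, b := range batches {
		if err := fn(b); err != nil {
			return fmt.Errorf("batch %d: %w", i, err)
		}
		if i < len(batches)-1 {
			time.Sleep(pause)
		}
	}
	return nil
}

func main() {
	// Placeholder CVE IDs, used only to drive the example.
	cves := []string{"CVE-0000-0001", "CVE-0000-0002", "CVE-0000-0003"}
	_ = processSequentially(cves, 2, 500*time.Millisecond, func(b []string) error {
		fmt.Println("counting hosts for", b) // stand-in for a batched SELECT
		return nil
	})
}
```

The trade-off is wall-clock time: total runtime grows by roughly the pause multiplied by the number of batches, in exchange for a flatter load profile on the reader.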

@iansltx
Member

iansltx commented Dec 19, 2024

Counterpoint to the above: smaller batches should help DB load massively, and we can expose an override for concurrency as an env var (with a default that we've confirmed works in load testing) so customers can tune how hard the tool hits the DB. My current guess is that this will be significantly lighter on the DB in load testing, even when run concurrently, than the old massive temp table method, and I think we'll get useful info from the load test here. With the env var in place, we can tune things easily enough once this hits production workloads.
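A rough sketch of what an env var override with a safe default could look like. The variable name `EXAMPLE_VULN_MAX_CONCURRENCY` and the helper `maxConcurrency` are made up for illustration; the real setting would presumably go through Fleet's config manager rather than reading the environment directly. The default of 5 is taken from the concurrency figure discussed in this thread.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

const (
	// Hypothetical variable name, chosen only for this example.
	envMaxConcurrency = "EXAMPLE_VULN_MAX_CONCURRENCY"
	// Assumed default, standing in for a value validated in load testing.
	defaultMaxConcurrency = 5
)

// maxConcurrency returns the override supplied through lookup, or the
// default when the variable is unset, not an integer, or less than 1.
// lookup has the same shape as os.LookupEnv so it can be swapped in tests.
func maxConcurrency(lookup func(string) (string, bool)) int {
	raw, ok := lookup(envMaxConcurrency)
	if !ok {
		return defaultMaxConcurrency
	}
	n, err := strconv.Atoi(raw)
	if err != nil || n < 1 {
		return defaultMaxConcurrency
	}
	return n
}

func main() {
	fmt.Println("max concurrency:", maxConcurrency(os.LookupEnv))
}
```

Falling back to the default on invalid input keeps a typo in the environment from silently disabling the limit or running with zero workers.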

@iansltx
Member

iansltx commented Dec 19, 2024

Also, given that we're talking about 5 concurrent sets of queries, if someone is going over prepared statement maximums based on these changes:

  1. They were running too hot to begin with
  2. They can adjust concurrency down via the proposed env var
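For reference, one way to cap the number of in-flight batch queries is a buffered channel used as a semaphore. This is a sketch using only the standard library, not the implementation in this PR; `runBounded` and the batch contents are hypothetical, and the callback stands in for a set of queries against the DB.

```go
package main

import (
	"fmt"
	"sync"
)

// runBounded calls fn once per batch with at most limit calls in flight.
// It waits for every call to finish and returns the first error observed.
func runBounded(batches [][]string, limit int, fn func([]string) error) error {
	if limit < 1 {
		limit = 1
	}
	sem := make(chan struct{}, limit) // one slot per allowed concurrent call
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		firstErr error
	)
	for _, b := range batches {
		sem <- struct{}{} // blocks while limit calls are already running
		wg.Add(1)
		go func(batch []string) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot when this call ends
			if err := fn(batch); err != nil {
				mu.Lock()
				if firstErr == nil {
					firstErr = err
				}
				mu.Unlock()
			}
		}(b)
	}
	wg.Wait()
	return firstErr
}

func main() {
	// Placeholder batches, used only to drive the example.
	batches := [][]string{{"CVE-A", "CVE-B"}, {"CVE-C"}, {"CVE-D"}}
	err := runBounded(batches, 5, func(b []string) error {
		fmt.Println("querying", len(b), "CVEs") // stand-in for batched SELECTs
		return nil
	})
	fmt.Println("error:", err)
}
```

Lowering `limit` reduces how many connections and prepared statements are in use at once, which is the knob item 2 refers to. Note that this sketch keeps launching remaining batches after a failure; stopping early would need a cancellation signal.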

@iansltx
Member

iansltx commented Dec 20, 2024

Started a Slack thread about which data sets to use to test this.

@mostlikelee mostlikelee marked this pull request as draft December 23, 2024 18:11
@mostlikelee mostlikelee marked this pull request as ready for review December 23, 2024 20:53
Member

@iansltx iansltx left a comment

My only feedback here is on the naming of the routines setting. I also see where the host count is bugged, but I'll fix that in a PR stacked on top of this one.

@@ -1257,6 +1258,11 @@ func (man Manager) addConfigs() {
		false,
		"Don't sync installed Windows updates nor perform Windows OS vulnerability processing.",
	)
	man.addConfigInt(
		"vulnerabilities.max_routines",
Member

Suggested change
- "vulnerabilities.max_routines",
+ "vulnerabilities.max_concurrency",

Seems like "concurrency" is more self-evident here. Guessing you cycled through that as a naming idea here, so it'd be useful to understand why this naming convention won.

@lukeheath
Copy link
Member

@mostlikelee Long-running PR alert!

Member

@iansltx iansltx left a comment

Sorry for the delay in approving this relative to when the commit landed.

@mostlikelee mostlikelee merged commit 80f503a into main Jan 13, 2025
35 checks passed
@mostlikelee mostlikelee deleted the 22364-vuln-counts branch January 13, 2025 22:44
4 participants